# Building SerDes Models in Julia, pt2 - Simulation Framework



For the uninitiated, SerDes systems have evolved to become extremely complicated as the transmission speed increases (>100Gbps) and transmission medium changes (from copper to fiber). The challenge lies in the intersection between continuous- and discrete-time signal processing. One example is when clocks are used to sample signals (either implicitly with a data slicer or explicitly with data converters nowadays). Complex digital adaptation and clock-data recovery (CDR) algorithms become more interwined because all loops are running at the same time. How to effectively model and simulate all these components together has always been an interesting topic.

While many different analysis methods exist, including frequency and statistical analysis, time domain results remain the final sign-off. Therefore, this framework focuses on transient simulation first and hopefully other analysis tools will be added in the near future.

# Key Simulation Parameters - time step and block size

In order to simulate time domain waveforms, we resort to arrays (or vectors) to represent such waveforms. Now in order to build an *efficient* model, we first need to decide the granuarity of a simulation step, as well as the block size to simulate in batch. If we want to simulate a sequence of random bits, they need to be converted to continuous time waveform first with a finite time step.

Then the step size choice directly impacts the accuracy and the speed of your sim. Allow me to illustrate

## Simulation time step

```
1 tui = 1/10e9; # Unit Interval time for 10Gbps data
1 osr1 = 4; # Oversampling ratio per symbol, i.e. # of points to simulate per symbol
1 osr2 = 32;
1 dt1 = tui/osr1; # Simulation time step
1 dt2 = tui/osr2;
1 bits = [0,1,0,1,0,0,1,1,1]; #arbitrary bits; use bitrand(N) to generate random bits
1 tt1, Vbits1 = gen_wvfm(bits, tui=tui, osr=osr1);
1 tt2, Vbits2 = gen_wvfm(bits, tui=tui, osr=osr2);
```

When osr is higher, the resultant waveform looks more like perfect square waveforms. Lower osr results in waveform distortion.



Here is what happens when we pass this waveform through a "channel" by means of convolution. With a larger time step, the convolution finishes faster because the array size is obviously smaller. However, there are noticeable kinks in the waveform, which might lead to simulation inaccuracy later. The accuracy increase also plateaus once the curve looks smooth enough.

Note that the increase in computation time doesn't increase linearly w.r.t array size becasue the convolution method in DSP.jl package uses FFT. Therefore, it's often more beneficial to just pick a bigger osr because Nlog(N) is cool!

```
1 bw_ir = 8e9;

1 tlen_ir = 20*tui; #20UI long impulse response

1 ir1 = gen_ir_rc(dt1, bw_ir, tlen_ir);

1 Qtime Vout1 = conv(ir1, Vbits1)*dt1;

0.002972 seconds (19 allocations: 7.234 KiB)

1 ir2 = gen_ir_rc(dt2, bw_ir, tlen_ir);

1 Qtime Vout2 = conv(ir2, Vbits2)*dt2;

0.000477 seconds (19 allocations: 46.016 KiB)
```



# Simulation (sub-)block size

Once the simulation time step is chosen, the next question is the batch size of the signal we want to process. The extreme examples would be when simulating 1 million bits, we can either generate and process 1 bit at a time or 1 million at once. This choice will equally impact simulation speed and accuracy. Like all things in engineering, it's always a spectrum and our framework should give the designer the freedom to adjust these parameters.

Here we define two types of blocks: an outer loop block and an inner loop sub-block. Each block will contain different modules corresponding to different circuits in your system. Let's look at the example block diagram below.

#### Sim Block



The outter loop block is suitable for circuits that are "quasi-static" and can process many bits at once. For example, a transmitter can generate waveforms for a large number of bits up front instead of simulating multiple short sequences, since we don't expect the tranmister parameters to change during simulation (e.g. FIR coefficients and bandwidth). This will save time by doing repetitve convolutions (as we have seen before, longer convolutions become faster due to FFT algorithm).

The inner loop sub-block is more suitable for circuits that require more accuracy and have more complicated loop dynamics. For example, the adaptation and CDR loops normally operate on a parallel bus of data and error slicer outputs. This vector then effectively determine the sub-block size because we want to capture the real dynamics of the feedback loops. Another way to think about this is that we implicitly created a "digital clock" through this sub-block loop, i.e. each loop iteration is a digital clock cycle.



The figure above illustrate the relationships between the big block and sub-blocks. We first batch process a long bit sequence, then iterate over the waveform through the sub-blocks.

Note that although this picture implies that the block size has to be an integer multiple of sub-block size, but in reality it could be extended through an elastic buffer layer in between. The current code only assumes integer relationship between these two numbers, but the elastic buffer approach would be the most general for modeling say frequency offsets between TX and RX (will be addressed in future development)

# The run\_sim() skeleton

With this in mind, we can start with a run\_sim() function pseudo-code already as

run\_sim (generic function with 1 method)

```
1 function run_sim(params)
       @unpack Nblks, Nsubblks, blk_size, subblk_size = params
3
       @unpack bist_params, tx_params, rx_params = params
4
       for n = 1:Nblks
5
6
            # run batch processing blocks here
 7
           # e.g.
8
           # S = bist_gen(bist_params)
            \# Vtx = tx\_drv(tx\_params, Vtx)
9
10
           for m = 1:Nsubblks
11
                # run shorter loops here
12
13
                # e.g.
                \# \Phi = clkgen(pi\_code)
14
15
                # Ssub = sampler(Vtx, \Phi)
                # dvec = slicers(Ssub)
16
17
                # ...
18
                # record waveforms here
19
20
           end
21
22
           # bist_check(bist_params, data_rcvd)
23
24
       end
25
26 end
```

**That's it!** This is pretty much the big picture of the framework, and now it's a matter of adding in the functional model of each circuit module **AS LONG AS** they satisfy this osr, blk\_size, subblk\_size framework. This usually means dealing with boundary conditions internally, remembering states from the previous (sub-)block, etc.

## World's best SerDes

Let's build the world's most ideal SerDes using this framework. What we need are some BIST functions that generates and checks PRBS, an ideal transmitter and slicers. Let's start with the bist\_prbs\_gen from our <u>first notebook</u>

bist\_prbs\_gen (generic function with 1 method)

```
1 function bist_prbs_gen(;poly, inv, Nsym, seed)
       seq = Vector{Bool}(undef,Nsym)
3
       for n = 1:Nsym
4
            seq[n] = inv
           for p in poly
5
6
                seq[n] \underline{\lor} = seed[p]
 7
           end
           seed .= [seq[n]; seed[1:end-1]]
8
9
       end
10
       return seq, seed
11 end
```

And here is an example ber\_check\_prbs function

ber\_check\_prbs (generic function with 1 method)

```
1 function ber_check_prbs(rcvd_bits; poly, inv, seed, lock_status, lock_cnt,
   lock_threshold, ber_err_cnt, ber_tot_cnt)
 3
       nbits_rcvd = lastindex(rcvd_bits)
 4
 5
       if lock_status #if prbs already locked, use prbs_gen for reference bits
 6
           ref_bits, seed = bist_prbs_gen(poly=poly, inv=inv,
 7
                                             Nsym=nbits_rcvd, seed=seed)
           ber_err_cnt += sum(rcvd_bits .y ref_bits)
 8
           ber_tot_cnt += nbits_rcvd
9
10
       else # if not locked yet, use received bits as seed
11
12
           for n = 1:nbits_rcvd
13
                brcv = rcvd_bits[n]
                btst = inv
14
15
                for p in poly
                    btst \underline{\vee}= seed[p]
16
                end
17
                seed .= [brcv; seed[1:end-1]]
18
19
20
                #need consecutive non-error for lock. reset when error happens
                lock_cnt = (btst == brcv) ? lock_cnt+1 : 0
21
22
23
                if lock_cnt == lock_threshold
24
                    lock_status = true
                    println("prbs locked")
25
26
                    #run prbs till end
27
                    ref_bits, seed = bist_prbs_gen(poly=poly, inv=inv,
                                             Nsym=nbits_rcvd-n, seed=seed)
28
29
                    ber_err_cnt += sum(rcvd_bits[n+1:end] .v ref_bits)
30
                    ber_tot_cnt += nbits_rcvd - n
31
32
                    break
33
                end
34
           end
35
       end
37
       return seed, lock_status, lock_cnt, ber_err_cnt, ber_tot_cnt
38 end
```

We can verify these two functions with direct feedthrough

```
1 poly = [28,31]; #PRBS31 polynomial
1 inv = false;
1 Nsym = Int(10e6); #Try changing number of bits
1 gen_seed = ones(Bool, 31);
1 chk_seed = zeros(Bool, 31);
```

```
1 bits_test, gen_seed1 = bist_prbs_gen(poly=poly, inv=inv, Nsym=Nsym, seed=gen_seed);
 1 chk_seed1, lock_status1, lock_cnt1, ber_err_cnt1, ber_tot_cnt1 =
 2
        ber_check_prbs(bits_test; poly=poly, inv=inv, seed=chk_seed,
 3
                        lock_status=false, lock_cnt=0, lock_threshold=256,
 4
                        ber_err_cnt=0, ber_tot_cnt=0);
    prbs locked
Try printing lock status, lock cnt, etc. below
 1 println(ber_err_cnt1)
We can now artificially add bit errors to verify our checker
 1 ber_target = 1e-4; #play around with this number and Nsym
 1 err_loc = rand(Uniform(0,1.0),Nsym).< ber_target;</pre>
 2 #Uniform is from Distributions package
 1 bits_werr = bits_test .v err_loc; #flip the bits at error locations
 1 chk_seed2, lock_status2, lock_cnt2, ber_err_cnt2, ber_tot_cnt2 =
 2
        ber_check_prbs(bits_werr; poly=poly, inv=inv, seed=chk_seed,
                        lock_status=false, lock_cnt=0, lock_threshold=256,
 3
                        ber_err_cnt=0, ber_tot_cnt=0);
 4
    prbs locked
 1 println("BER ≈ " * string(ber_err_cnt2/ber_tot_cnt2))
    BER \approx 9.610275814915889e-5
```

Let's now build an ideal transmitter and receiver. The ideal transmitter will simply take the bit sequence and oversample according to osr to create the time domain waveform. Our ideal receiver will then take samples from the waveform with direct indexing. Because the waveform is perfect, it doesn't really matter which index we take the samples at.

tx\_drv\_top (generic function with 1 method)

```
function tx_drv_top(bits; osr, swing)

# just like the gen_wvfm helper function, but no need for the time vector

# change output waveform to analog voltage in +/- swing/2

# the dots are needed to do elementwise operation on the bits vector

return kron(sign.(bits.-0.5), swing/2*ones(osr))

end
```

rx\_slice\_top (generic function with 1 method)

```
function rx_slice_top(Vin; sample_idx, osr, slice_lvl)

# sample_idx is a number between 1 and osr, will wrap around for now

Vsamples = Vin[mod(Int(sample_idx)-1, osr)+1:osr:end]

return Vsamples .> slice_lvl

end
```

Now let's write a simple run\_sim\_ideal() function to plug all these functions in. In this example, we don't have a sub-block loop, which would make more sense when CDR and adaptation is introduced.

run\_sim\_ideal (generic function with 1 method)

```
1 function run_sim_ideal(osr, blk_size, nsym_tot)
2
3
       # initialize parameters
       gen_seed = ones(Bool, 31)
4
5
       chk_seed = zeros(Bool, 31)
6
       poly = [28,31]
       inv = false
 7
8
9
       lock_status = false
10
       lock\_cnt = 0
11
       ber_err_cnt = 0
       ber_tot_cnt = 0
12
13
14
       ber_target = 0e-5;
15
16
       Vtx = zeros(blk_size*osr)
17
       #round nblk to the nearest total # of symbols
18
19
       nblk = Int(round(nsym_tot/blk_size))
20
21
       # main loop
       for n = 1:nblk
22
23
           #generate bits
           bits, gen_seed = bist_prbs_gen(poly=poly, inv=inv,
24
25
                                             Nsym=blk_size, seed=gen_seed);
26
27
           #generate TX waveform
28
           Vtx = tx_drv_top(bits, osr=osr, swing=1)
29
30
           #sample TX waveform
31
           bits_rcvd = rx_slice_top(Vtx, sample_idx=osr/2, osr=osr, slice_lvl=0)
32
           #add random bit error
33
34
           err_loc = rand(Uniform(0,1.0),blk_size).< ber_target;</pre>
35
           bits_werr = bits_rcvd . v err_loc;
36
37
           #check error
           chk_seed, lock_status, lock_cnt, ber_err_cnt, ber_tot_cnt =
38
39
                ber_check_prbs(bits_werr; poly=poly, inv=inv, seed=chk_seed,
40
                        lock_status=lock_status, lock_cnt=lock_cnt, lock_threshold=256,
41
                        ber_err_cnt=ber_err_cnt, ber_tot_cnt=ber_tot_cnt);
42
           #record waveforms here if needed
43
44
45
       end
46
47
       println("Sim done")
       println("BER: " * string(ber_err_cnt/ber_tot_cnt))
48
49
50
       return Vtx
51 end
```

```
1 @time Vtx = run_sim_ideal(32, 128, Int(1e6));
```

2 #play around with the osr and blk\_size parameters and see how the run time is impacted

```
prbs locked
Sim done
BER: 0.0
0.563659 seconds (4.13 M allocations: 585.194 MiB, 15.79% gc time)
```



Congratulations! We have just built your first SerDes model. I admit that this model isn't that useful, and the code itself is not looking that pretty. Trust me, it will get better  $\odot$ .

The main issue here is that we have quite a few *states* that needs tracking (e.g. seed, ber\_cnt, etc.), and spelling them out explicitly will get out of hand pretty quickly. In the next notebook, we will introduce **struct** in Julia, which is a lighter weight data structure than "classes" for those who are familiar with object oriented programming.

As a preview, we will use structs to store relevant parameters and states for the corresponding circuit module, and pass the structs around in functions. The TX driver function can then be written as the following

tx\_drv\_top (generic function with 2 methods)

```
function tx_drv_top(tx, bits)
    @unpack osr, blk_size, swing = tx #grab the parameters we need from the tx struct
    return kron(sign.(bits.-0.5), swing/2*ones(osr))
end
```

### Julia tips

Oops! I named the function tx\_drv\_top again, but with different arguments. No worries! Julia implements something called <u>multiple dispatch</u> (learn more <u>here</u>). A function can take arguments of different types, and share the same name. This would be particularly helpful when we want to optimize function performance and/or allow different usage cases (e.g. explicit arguments vs. struct). We won't go too much into the details of how to use multiple dispatch and typing to optimize Julia performance, but it will be relevant for improving the framework in the long run.

tx\_drv\_top! (generic function with 1 method)

```
function tx_drv_top!(tx, bits)
    #grab the parameters we need from the tx struct
    @unpack osr, blk_size, swing = tx

#store output directly into the struct
    tx.Vout .= kron(sign.(bits.-0.5), swing/2*ones(osr))
end
```

## Julia tips

Julia uses a convention of putting "!" in the function name to denote that it's a mutating function. This means that the argument passed into the function will be directly changed and no explicit values will be returned. Depending on your programming philosophy, there are pros and cons to this approach. Directly mutating the arguments allows more efficient memory usage (i.e., no need to allocate new variables every time a function is called), but one needs to be careful about the sequence of function calls and keep track of the states. This simulation framework choses to mainly to use mutating functions, which we will cover more in the next notebooks.

In the next notebook, we will focus more on the data structures and how the code base will be structured. We will start to build a more complete TX module as we discuss more Julia details.

# Helper function section

gen\_wvfm (generic function with 1 method)

1 function gen\_wvfm(bits; tui, osr)
2 #helper function used to generate oversampled waveform

3 
4 Vbits = kron(bits, ones(osr)) #Kronecker product to create oversampled waveform

5 dt = tui/osr
6 tt = 0:dt:(length(Vbits)-1)\*dt

7 
8 return tt, Vbits
9 end

## Julia tips

The arguments in the gen\_wvfm function are separated by a semicolon. This syntax in Julia means that anything after the semicolon is a keyword argument. An explicit "kw = something" statement is needed during function calls. Having keyword argument allows a function to have default parameters as well.

gen\_ir\_rc (generic function with 1 method)

```
1 function gen_ir_rc(dt,bw,t_len)
       #helper function that directly calculates a first order RC response, normalized
       to the time step
       tt = [0:dt:t_len-dt;]
       #checkout the intuitive symbols!
5
       \omega = (2*\pi*bw)
6
 7
       ir = \omega * exp.(-tt*\omega)
8
       ir .= ir/sum(ir*dt)
9
10
       return ir
11 end
```

## Julia tips

Unicode is supported in Julia, thus one can write cleaner code using commonly known symbols like in the gen\_ir\_rc function.

```
1 using Makie, CairoMakie
```

```
1 using DSP, Random, Distributions, UnPack, BenchmarkTools
```